3: Functions

R for faculty, Fall 2024

Jonas Moss

Functions

What is a function?

Mathematics: A function maps each \(x\) in \(A\) to some unique \(y\) in \(B\).

  • Functions in programs are usually like this.
  • The value \(y\) is called the return value.
  • Classical functions such as \(f(x) = x^2 - 1\) are obvious examples, but functions do not need to be described by a formula.
  • In many programming languages, functions are called procedures.

Defining functions

Euclidean norm on \(R^2\). \(||x|| = \sqrt{x_1^2+x_2^2}\).

norm <- function(x1, x2) {
  sqrt(x1^2 + x2^2)
}

Then we call the function:

norm(1, 1)
[1] 1.414214

Write a function norm(x1,x2,x3) that calculates the Euclidean norm on \(\mathbb{R}^3\), \(||x|| = \sqrt{x_1^2+x_2^2+x_3^2}\). Calculate the norm of \([1,2,3]\).

norm <- function(x1, x2, x3) {
  sqrt(x1^2 + x2^2 + x3^2)
}

norm(1,2,3)
[1] 3.741657

Shorthands

Functions can also be written using a shorthand notation:

norm <- \(x, y) {
  sqrt(x^2 + y^2)
}

norm(1,1)
[1] 1.414214

Write a function norm(x1,x2,x3,x4) that calculates the Euclidean norm on \(\mathbb{R}^4\), \(||x|| = \sqrt{x_1^2+x_2^2+x_3^2+x_4^4}\). Use the shorthand notation. Calculate the norm of \([1,1,0,1]\).

norm <- \(x1, x2, x3, x4) {
  sqrt(x1^2 + x2^2 + x3^2 + x4^2)
}
norm(1,1,0,1)
[1] 1.732051

Functions on one line

When functions fit one on line you don’t need the curly braces.

norm <- function(x1, x2, x3) sqrt(x1^2 + x2^2 + x3^2)

Write the Euclidean norm \(||x|| = \sqrt{x_1^2+x_2^2+x_3^2+x_4^4}\) on one line, using the shorthand notation.

norm <- \(x1, x2, x3, x4) sqrt(x1^2 + x2^2 + x3^2 + x4^2)

\() vs function()

  • The \() notation is quite new, introduced in 2021.
  • Most R code uses function().
  • I personally prefer \(). It’s shorter! function is just too long.
  • There’s no difference between the choices. Use what you like.

Make a function that calculates the signal-to-noise ratio, defined as the mean divided by the standard deviation. (Hint: Find R functions that calculate the mean and standard deviation yourself!)

snr <- \(x) mean(x) / sd(x) 

Function arguments

Look at the norm function once more:

norm <- \(x1, x2, x3) sqrt(x1^2 + x2^2 + x3^2)

Here x1, x2, and x3 are the arguments of the function

What’s wrong with following function definition?

f <- \(a,b) {
  a + B
}

B is not an argument of the function – but b is!

f <- \(a,b) {
  a + b
}

Function composition

You can compose multiple function calls. For example, you can compute the biased sample variance using sqrt() and mean():

square <- \(x) x^2
deviation <- \(x) x - mean(x)

The biased sample variance equals

x <- mtcars$mpg
mean(square(deviation(x)))
[1] 35.18897

Exercise time

Let’s get comfy with functions through exercises.

Write a function summer that calculates the sum of the first \(n\) natural numbers, i.e., 1,2,\ldots,n.

summer <- \(n) sum(seq(n))
summer2 <- \(n) (n*(n+1))/2

Write a function hn that calculates the \(n\)th harmonic number, i.e., \(H_n=1/1 + 1/2 + 1/n\). (Hint: What happens to 1/seq(n)?)

hn <- \(n) sum(1/seq(n))

A famous approximation to the \(n\)th harmonic number states that \(H_n \approx \log(n)+\gamma\), where \(\gamma\) is the Euler–Mascheroni constant. (-digamma(1) in R).

Write a function h_n_approx that returns the approximation to \(H_n\). Verify it on n=1000.

hn_approx <- \(n) log(n) - digamma(1)
hn_approx(1000)
[1] 7.484971
hn(1000)
[1] 7.485471

Write a function is_awesome(x) that returns "LOL" if its input is "MrBeast" and "Boring!" otherwise. Test the function on "MrBeast" and "William Shakespeare".

is_awesome <- \(x) {
  if (x == "MrBeast") {
    "LOL"
  } else {
    "Boring!"
  }
}

is_awesome("MrBeast")
[1] "LOL"
is_awesome("William Shakespeare")
[1] "Boring!"

Loops and functions

Functions are often used to “hide” information from the user, such as loops.

We may sum the values of a vector as follows:

summer <- \(x) {
  adder = 0
  for (i in x) {
    adder = adder + i
  }
  adder
}
summer(1:20)
[1] 210
summer(5:11)
[1] 56

Using for loops, make a function maxi that returns the maximal value of a vector. Test it on x<-c(1,2,10,-1,0). You must use conditionals here.

maxi <- \(x) {
  current = -Inf
  for (i in x) if (i > current) current = i
  current
}
maxi(c(1,2,10,-1,0))
[1] 10

Missing number

Programming requires thinking.

We have the tools for solving the next exercise!

Suppose that x is a vector of numbers 1,2,...,n, but one number less than n is missing. Make a function get_missing that returns the missing number.

x <- c(1,2,4,5)
get_missing(x) # 3
x <- c(1,2,4,5)
get_missing <- \(x) sum(seq_len(max(x))) - sum(x)
get_missing(x)
[1] 3

Suppose that x is a vector of numbers 1,2,...,n, but with one number. Make a function get_missing2 that returns the missing number.

get_missing2 <- \(x) {
  if (max(x) == length(x)) {
    length(x) + 1
  } else {
    get_missing(x)
  }
}
get_missing2(c(1,2,4,5))
[1] 3
get_missing2(c(1,2,3,4,5))
[1] 6

Explicit returns

R returns the last calculated value of a function, but you can also return explicitly.

is_awesome <- \(x) {
  if (x == "MrBeast") {
    return("LOL")
  } 
  "Boring"
}

Write a function get_non_zero(x) that returns NULL if x is zero and 0 otherwise.

get_non_zero <- \(x) {
  if (x == 0) {
    return(NULL)
  }
  x
}

Function scope

The values defined in a function are private to that function – they belong to the function’s private scope.

What value of x is printed?

f <- \() {
  x <- 55
  x
}

x = 0
f()
print(x)

The solution is 0, as the x inside f belongs to that function’s scope.

What value will f(1, 2) return?

f <- \(a,b) {
  a + b
}

a <- 55

f returns 3, since a is an argument.

What value will f(1, 0) return? Or will it work at all?

f <- \(a, b) {
  a + B
}

B <- 2
f(1, 0)
[1] 1

The function f doesn’t use its second argument, but it does use B! It can use B because it exists in its defining environment. (Functions having this property are called closures.)

Functions in R

  • Writing your own functions is important.
  • But it’s even more important to become familiar with Rs built-in functions.
  • You need to look at the documentation (write ?fun in the terminal) and look up stuff at the internet.
  • Looking up the documentation is better!

Figure out what the following the functions rev(y) and sort(y) do. How do you sort y in decreasing order? Let y=c(4,6,2,1) for simplicity. (There are two ways to do this!)

More interactivity

More functions

Side-effects and pure functions

  • In mathematics a function \(f:X\to Y\) is a “rule” that sends an object in \(X\) to an object in \(Y\).

  • Such functions exist in R and other programming languages too; they are called pure.

  • But functions can do other stuff as well. Such “other stuff” is called side effects.

What does this function do?

f <- \(x) {
  print(paste0("I love ", x+1, "!"))
  x+3
}
2 + f(3)
[1] "I love 4!"
[1] 8

The function prints I love {x}! and returns x+3. Printing is called a side-effect

Closures

A function can access the variables in its defining environment, but will not modify them.

t <- 55
f <- \(x) {
  x + t
}
f(3)
[1] 58
t <- 55
f <- \(x) {
  print(paste0("Now t=", t, "."))
  t <- pi
  print(paste0("But now, t=", t, "!"))
  x
}
f(7)
[1] "Now t=55."
[1] "But now, t=3.14159265358979!"
[1] 7
t
[1] 55

Here f was able to access t=55 from its defining enviroment. But it can’t (easily!) modify the value of t in its calling environment.

What happens when we run the following code?

i <- 5
f <- \(i) {
  i <- i
  i
}
print(f(3))
print(i)

Function arguments always take precedence over values in the calling environment!

i <- 5
f <- \(i) {
  i <- i
  i
}
print(f(3))
[1] 3
print(i)
[1] 5

Using <<-

You can modify variables in the calling environment using functions. To do this, use <<-. This is very rarely useful!

x <- 55
f <- \(y) {
  x <<- y
}
print(x)
[1] 55
f(13)
print(x)
[1] 13

Returning functions

Take a look at the following function.

factory <- \(x) {
  adder <- \(y) x + y
  adder
}

What is this?

f <- factory(1)
f
function (y) 
x + y
<environment: 0x00000256e409fdf0>

Use the f to calculate the sum 1+2.

factory(1)(2)
[1] 3

Application

  • Recall the Newton–Raphson method from calculus.
  • We have a function \(f(x)\) and want to find the unique \(x\) where \(x=0\).
  • To to this, we use the iterates \(x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}\), where \(f'\) is the derivative of \(f\).
  • We continue the iteration until \(|x_{i+1} - x_i|/|x_i| < \epsilon\) for some small \(\epsilon\).

The square root

  • The square root of a number \(a\) is the unique real solution to \(x^2 -a = 0\).
  • Let’s write a Newton–Raphson method for calculating the square root of \(2\).
f <- \(x) x^2 - 2
df <- \(x) 2*x
x0 <- 2
epsilon <- 10
while (epsilon > 0.000001) {
  x1 <- x0 - f(x0) / df(x0)
  epsilon = abs(x1-x0) / abs(x0)
  x0 <- x1
}
x0 - sqrt(2)
[1] 0

Make a function square_root(a, x0, eps) that calculates the square root of a using Newton–Raphson. (Here x0 is the starting value and epsilon is the desired accuracy.) Give x0 a default value of 1 and epsilon a default value of 0.000001.

square_root <- \(a, x0 = 1, eps = 0.000001) {
  f <- \(x) x^2 - a
  df <- \(x) 2*x
  epsilon <- 10
  while (epsilon > eps) {
    x1 <- x0 - f(x0) / df(x0)
    epsilon = abs(x1-x0) / abs(x0)
    x0 <- x1
  }
  x0
}

Documentation

  • Functions should always be documented.
  • There is a standardized way to document functions and other R objects.
  • It is called Roxygen. Documentation comments start with #'.
#' Estimate signal-to-noise ratio.
#' @param x Vector of observations.
#' @returns Estimated signal-to-noise ratio. 
snr <- \(x) {
  mean(x) / sd(x) 
}

Write a function mad2 that estimates the mean absolute deviation from the median. Write documentation for the function.

#' Estimate the mean absolute deviation from the median
#' @param x Vector of observations.
#' @returns The mean absolute deviation from the median.
mad <- \(x) mean(x - median(x))

Comments and documentation

  • Its common advice to place plenty of comments in your code.
  • I’d suggest you avoid comments as much as you can. Write documentation instead!

The ... arguments (dot-dot-dot)

Look at the documentation ?sum. The argument ... tells R the function takes an arbitrary number of arguments, named or not.

Optimization

  • Recall that the maximum likelihood estimator of a density family \(f(x;\theta)\) on the data \(x_1,\ldots,x_n\) is the \(\theta\) that maximimes \(\sum_{i=1}^n \log f(x_i;\theta)\).

  • The nlm function (non-linear minimization) does function minimization for you. It usually works well, but there are better, bespoke methods.

x <- univariateML::egypt$age
f <- \(p) -sum(dweibull(x, p[1], p[2], log = TRUE))
nlm(f, c(1, 1))$estimate
[1]  1.404159 33.563594

Compare to nlm solution above to the bespoke solution mlweibull in the univariateML package.

univariateML::mlweibull(x)
Maximum likelihood estimates for the Weibull model 
 shape   scale  
 1.404  33.564  

Summary

  1. Functions are defined using \() {} or function() {}.
  2. Document your functions using Roxygen.
  3. Most “natural” statistical functions are already implemented in R.
  4. You need to write loads of functions to get good at programming. Deliberate practice.